Skip to main content

Milvus

Querying Overview

One way to specify a Milvus query in Qarbine is to use a JSON-like structure. Below is an example to retrieve up to 10 matches from the quick_setup collection.

{
"collectionName": "quick_setup",
"annsField" : "vector",
"limit": 3,
"data": [ [0.19886813, 0.060235605, 0.6976963, 0.26144746, 0.8387295] ]
}

Milvus can store complex JSON documents. Below is an example row.

{
'title': 'The Reported Mortality Rate of Coronavirus Is Not Important',
'title_vector': [0.041732933, 0.013779674, -0.027564144, ..., 0.030096486],
'article_meta': {
'link': 'https://medium.com/swlh/the-important-369989c8d912',
'reading_time': 13,
'publication': 'The Startup',
'claps': 1100,
'responses': 18,
'tag_1': [4, 15, 6, 7, 9],
'tag_2': [ [2, 3, 4], [7, 8, 9], [5, 6, 1] ]
}
}

For details on using such JSON data see https://milvus.io/docs/use-json-fields.md.

Milvus offers a variety of parameters to control how the comparison is done and what properties are returned. You can query Milvus using one or a combination of a semantic (i.e. vector) search and a lexical (i.e. scalar) search. The former is for ‘similar’ oriented searches while the latter uses more traditional matching techniques (similar to a SQL WHERE clause).

There are a variety of search types to meet different requirements:

Type Usage
Basic Includes single-vector search, bulk-vector search, partition search, and search with specified output fields.
Filtered Applies filtering criteria based on scalar fields to refine search results.
Range Finds vectors within a specific distance range from the query vector.
Grouping Groups search results based on a specific field to ensure diversity in the results.

For more information see https://milvus.io/docs/single-vector-search.md.

Qarbine provides 2 query options for Milvus:

  1. JSON query structure and
  1. SQL-like request.

The latter is an interface that resides above the JSON query structure API. It can be more convenient to specify criteria using SQL than the more complex JSON structure approach. Most of the Milvus features are still accessible when using the SQL syntax so there is no loss of Milvus features or complex middleware in play.

Vector Query Prerequisites

For vector related queries, prior to using Qarbine’s embeddings(...) macro function or the SQL-like query function nearText(...), the Qarbine Administrator must first configure “AI Assistant(s)”. The AI Assistants provide access to various popular Generative AI services and are referenced using an alias. Check with your Qarbine administrator for which ones are available and their proper use. For example, when using dynamic query vector embeddings, the model used by the AI Assistant must be compatible with the one used to generate the original embedding values in the database.

Query Specification Options

Primary Options

The primary specification options are described below.

Field Description
collectionNameThe Milvus collection to perform the query upon.
partitionNameThe partition names to perform the search.
annsFieldThe field upon which there is a vector index. Different vector fields can represent different aspects, different embedding models or even different modalities of data characterizing the same entity.
dataThe list of embeddings from which to determine similarity.
nearTextThe value is a string with the similarity phrase. For example “dracula movies”. An embedding value for the nearText argument will be obtained by Qarbine using a configured Qarbine AI Assistant. When using this option the model used to insert the Milvus data must correspond to the one used by the Qarbine AI Assistant.
outputFieldsThe list of field names to return for each matching element. An example is [“color”]. By default any vector field is not returned in order to reduce the answer set size. If you want the vector values then explicitly list it here. To get all the fields and the vector field use something like the following,outputFields: [ '*', 'title_vector']
searchParamsA JSON argument controlling how Milvus performs its similarity matching.An example is { metricType: 'L2', params : { radius: 1 } }.
filterThe SQL-like filtering expression. See the section below for more information.
offsetIt indicates how many objects of the answer set to skip over as part of the returned answer set.
limitThe maximum number of matches to return.

When there is no embedding or nearText parameter then a Milvus “query” is performed. The filter parameter is required. See this link for details https://milvus.io/api-reference/restful/v2.4.x/v2/Vector%20(v2)/Query.md.

When data or nearText is provided then a Milvus “search” is performed. The annsField is required. See this link for details https://milvus.io/api-reference/restful/v2.4.x/v2/Vector%20(v2)/Search.md.

Filters

Filters act like SQL WHERE clauses to determine which elements match. An element may be similar to any given vector, but not pass the filter rules. Below is a specification using a filter.

{
collectionName: 'medium_articles',
limit: 2,
filter: 'reading_time < 10',
outputFields: [ 'title', 'publication']
}

Details on Milvus filtering can be found at https://milvus.io/docs/boolean.md.

Qarbine SQL Interface

SQL Oriented Filtering

Recall that Milvus supports semantic (i.e. vector) search and a lexical (i.e. scalar/matching) search. The use of the specification structure described above can be a bit verbose and cumbersome though. To improve readability and productivity when authoring Milvus retrievals, Qarbine provides a SQL oriented option. For example, here is an example of a vector search retrieval for the quick_setup collection.

{
collectionName: quick_setup,
outputFields: [ *],
annsField: 'vector',
data : [ [ 0.19886813, 0.060235605, 0.6976963, 0.26144746, 0.8387295] ],
filter: "color in ['pink_8682', 'red_9392']",
limit: 10
}

The Qarbine SQL equivalent is simply

select * 
from quick_setup
where nearVector (0.19886813, 0.060235605, 0.6976963, 0.26144746, 0.8387295)
and withOption('annsField', "vector")
and color in ('pink_8682', 'red_9392')
limit 10

Note that a SQL list is enclosed in parentheses while one in the JSON specification is enclosed in brackets. That is a subtle nuance across the SQL and JSON syntax standards.

Qarbine’s Milvus integration extends to the filtering features as well. Qarbine is your co-pilot translating SQL-oriented queries into their lower level specification equivalents. In some cases the Qarbine Data Source will have literally just the SQL statement above and nothing more.

The mapping of the standard SQL clauses to their Milvus equivalents is described below.

Clause Description
SELECTThe names of the fields to return. Specifying “*” indicates all object fields . This does not set the outputFields field in which case Milvus returns all of the properties. You can also reference meta properties by prefixing them with the case -sensitive “meta_”. Here are some examples.
SELECT * …
SELECT title, rating_value …
SELECT *,meta_distance …
SELECT title, rating_value, meta_distance …

Including the named vector field in the SELECT list overrides the default behavior of not including it in the answer set.
FROMThe name of the Milvus collection. This value sets the “collectionName” field in the query specification.
WHERESee the discussion below. The effect is to set the “filter” field of the query specification.
ORDER BYThe sorting rules in “column asc|desc” format. Sorting is done by Qarbine after Milvus returns the answer set. This sets the “sortBySql” field of the query specification.
OFFSETIndicates where in the the return objects start return objects. This sets the “offset” field of the query specification.
LIMITIndicates at most how many elements to return. This sets the “limit” field of the query specification.

Bear in mind that some combinations of query fields may not make sense in the Milvus world.

The WHERE clause criteria can be in a variety of traditional SQL forms and may include Qarbine specific functions described below. For example,

select * from Movies where nearText("dracula")

results in a query specification with these fields,

collectionName: "Movies",
nearText: "dracula"

Some additional Qarbine defined SQL functions are listed below.

nearText(aPhrase)
nearVector(number1, number n …)
vector = (number 1, number n ...)A different way of expressing nearVector()

The table below describes their use.

Function Description
nearVectorThis clause is removed from the WHERE criteria and its list of numbers argument set into the “nearVector” field of the query specification.
nearTextThis clause is removed from the WHERE criteria and its argument set into the “nearText” field of the query specification. The nearText argument can be used by query.nearText(), hybrid.nearText(), or generate.nearText(). Indicate which operation is wanted in the query specification.
withOptionPass in the specification field name and the value to set. This clause is removed from the WHERE clause. This is commonly used to set the annsField.
withOptionsSet several specification fields at once. The format is withOptions(key1, value1, keyN, valueN).The key argument may use dot notation when setting the inner value of a component object.

The nearText() parameters are described below.

Parameter Description
aPhraseA quoted value for which a vector is first obtained by Qarbine and then passed along as the raw vector value.
aiAssistantAliasRefers to an AI Assistant alias as configured by the Qarbine administrator. This is important to consider so that the model used to generate the raw vector value is compatible with that used to create the stored values.

There are techniques to blend the ease of using SQL along with the powerful features of Milvus within a Qarbine JSON specification object. The table below lists the fields that drive this definition.

JSON Field Description
sqlThe SQL statement can affect all of the primary options listed above.
sqlWhereThe string can affect all of the primary options listed above except for outputFields and collection.
sortBySqlThe ORDER BY clause specifying how to sort the Milvus answer set.

Here is a simple example of combining the SQL and query specification approaches. The effective result is the same as the example query specification above.

{
sql: "select * from quick_setup",
data : [ [ 0.19886813, 0.060235605, 0.6976963, 0.26144746, 0.8387295] ],
}

Reviewing the Generated Specification

You can enter criteria of the form “EXPLAIN SELECT ….” to have the SQL statement processed and have the returned answer set be the underlying query specification. For example enter and run

explain
select *
from quick_setup
where nearVector (0.19886813, 0.060235605, 0.6976963, 0.26144746, 0.8387295)
and withOption('annsField', "vector")
and color in ('pink_8682', 'red_9392')
limit 10

Select the single result element and its details are shown to the right.

  

Click the “+” to expand all of the JSON object fields.

  

A convenient way of specifying this is to have “explain” on the first line and the rest of your SQL on the next lines.

explain
select *
from quick_setup
where nearVector (0.19886813, 0.060235605, 0.6976963, 0.26144746, 0.8387295)

Then simply “comment out” the first line when not in use

// explain
select *
from quick_setup
where nearVector (0.19886813, 0.060235605, 0.6976963, 0.26144746, 0.8387295)

You can also use “explain: true” in the JSON query specification for similar information.

Another way to get the specification is to press ALT and click   . Below is a sample result.

  

Any “explain SELECT” or “explain: true” takes precedence over the ALT-click interaction.

Caveats

Qarbine caches collection descriptions to optimize its operations. To force a redetermination of a collection’s structure use

forceDescription: true

or, when using SQL,

withOption(‘forceDescription’, true)

Qarbine Virtual Queries

There are a few convenience queries which are mainly DBA oriented. These queries are recognized by the Qarbine driver and provide common database information.

Query Description
list databasesReturn a list of databases.
list collectionsReturn a list of Milvus collections.
describe collectionsProvide details on all of the collections. This may take a while depending on your database structure.
describe collection COLLECTIONProvide details on the given collection.

See the “DBA Productivity” section of the online documentation for more details.